Methods for the Classification of Data from Open-Ended Questions in Surveys

Disputation
16 April 2024

Camille Landesvatter

University of Mannheim

Research Questions

Which methods can we use to classify data from open-ended survey questions?
Can we leverage these methods to make empirical contributions to substantive research questions?

Motivation

1️⃣ Increase in methods to collect natural language (e.g., smartphone surveys with voice technologies) requires the evaluation of available classification methods.

2️⃣ Special structure of open-ended survey answers (e.g., shortness, lack of context) requires the testing of machine learning methods for the survey context.

  • Fully manual: no automation
  • Semi-automated: supervised ML, pre-trained models, prompt-based learning
  • Fully automated: unsupervised ML, clustering methods, topic models

3️⃣ Open answers have the potential to equip researchers with rich data useful for various subjects of research.

Overview of Studies

  • Study 1: "How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning" (research field: measurement equivalence)
  • Study 2: "Open-ended survey questions: A comparison of information content in text and audio response formats" (research field: questionnaire design)
  • Study 3: "Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?" (research field: emotion analysis)

Study 1:
“How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning”
(Published in Sociological Methods & Research)

Study 1: Characteristics

  • Background: ongoing debates about which type of trust survey researchers are measuring with traditional survey items (i.e., equivalence debate cf. Bauer & Freitag 2018)

  • Research Question: How valid are traditional trust survey measures?

  • Questionnaire Design: 5 open-ended questions per respondent, block-randomized order

  • Data: U.S. non-probability sample; \(n\)=1,500 with 7,497 open answers

Study 1: Methodology

Figure 1: Supervised Classification for a Trust Question.

Supervised classification approach:

    1. manual labeling of randomly sampled documents (n=[1,000/1,500])
    2. fine-tuning the weights of two BERT models, using the manually coded data as training data, to classify the remaining n=[6,500/6,000]
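The label-then-classify workflow above can be sketched with the deck's own baseline, a Random Forest (later outperformed by BERT). The snippet below is a minimal, hypothetical sketch using TF-IDF features and toy example answers, not the study's actual data or coding scheme:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Step 1: a small, manually labeled sample (toy examples; the study used
# 1,000-1,500 hand-coded open answers as training data).
labeled_texts = [
    "I was thinking of people I don't know personally.",
    "My close friends and family members.",
    "Strangers I pass on the street.",
    "A former neighbor of mine.",
]
labels = [0, 1, 0, 1]  # 0 = unknown others, 1 = known others

# Step 2: fit a baseline classifier on the labeled sample, then apply it
# to the remaining (unlabeled) answers.
clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=42))
clf.fit(labeled_texts, labels)

unlabeled = ["Tourists that come to our little village."]
predictions = clf.predict(unlabeled)
```

Swapping the pipeline for fine-tuned BERT models (as in the study) changes the classifier but keeps the same two-step workflow.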

Study 1: Results

| ID   | Measure              | Trust | Probing Answer                                                                      | Association (known others) | Association (sentiment) |
|------|----------------------|-------|-------------------------------------------------------------------------------------|----------------------------|-------------------------|
| 123  | Most people          | 0.33  | I was thinking of people I don’t know personally.                                   | 0 (No)                     | 0 (neutral/positive)    |
| 3139 | Most people          | 0.17  | Tourists that come to our little village. I tend to be very wary of them.           | 0 (No)                     | 1 (negative)            |
| 2980 | Stranger             | 0     | No one in particular, but I don’t think I could trust anyone ever again.            | 0 (No)                     | 1 (negative)            |
| 4286 | Watching a loved one | 0     | A former neighbor of mine who was a single father with a son close to my son’s age. | 1 (Yes)                    | 0 (neutral/positive)    |

Table 2: Illustration of exemplary data. Note: n=7,497.

Figure 2: Associations and Trust Scores. Note. CIs are 95% and 90%.

Study 2:
“Open-ended survey questions: A comparison of information content in text and audio response formats”
(Submitted to Public Opinion Quarterly)

Study 2: Characteristics

  • Background: requests for spoken answers are assumed to trigger an open narration with more intuitive and spontaneous answers (e.g., Gavras et al. 2022)

  • Research Question: Are there differences in information content between responses given in voice and text formats?

  • Experimental Design: random assignment into either the text or voice condition

Study 2: Methodology

  • Operationalization of information content in open answers via application of measures from information theory and machine learning

    • response length, number of topics, response entropy
  • Questionnaire Design: 9 open-ended questions per respondent, block-randomized order

  • Data: U.S. non-probability sample; \(n\)=1,461 with \(n_{text}\)=800 and \(n_{audio}\)=661

    • average item non-response rate text: 1%
    • average item non-response rate audio: 53%
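One of the information-content measures named above, response entropy, can be sketched as the Shannon entropy of an answer's word distribution. This is one plausible operationalization, assumed here for illustration; the study's exact unit of analysis (words, characters, or topics) may differ:

```python
import math
from collections import Counter

def response_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the word distribution in an answer.

    A fully repetitive answer carries zero entropy; a more varied
    vocabulary yields higher entropy.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    total = len(words)
    probs = [count / total for count in Counter(words).values()]
    return sum(-p * math.log2(p) for p in probs)

print(response_entropy("yes yes yes"))          # 0.0 (one word type)
print(response_entropy("I trust most people"))  # 2.0 (four distinct words)
```

Response length is simply `len(words)`; the number of topics would come from a topic model and is not sketched here.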

Study 2: Results

Figure 3: Information Content Measures across Questions.
Note. CIs are 95%, n_vote-choice: 830 (audio: 225, text: 605), n_future-children: 1,337 (audio: 389, text: 748)

Study 3:
“Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?”
(Submitted to American Political Science Review)

Study 3: Characteristics

  • Background: conventional notion stating that trust originates from informed, rational, and consequential judgments is challenged by the idea of an “affect-based” form of (political) trust (e.g., Theiss-Morse and Barton 2017)

  • Research Question: Are individual trust judgments in surveys driven by affective rationales?

  • Questionnaire Design: voice condition only

  • Data: U.S. non-probability sample; \(n\)=1,474 with 491 audio open answers

Study 3: Methodology

Figure 4: Methods for Sentiment and Emotion Analysis.
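The deck's simplest sentiment method, the dictionary approach (later contrasted with deep-learning models), can be sketched as counting lexicon hits. The tiny lexicon below is hypothetical; real analyses use validated dictionaries such as LIWC or the NRC emotion lexicon:

```python
# Toy sentiment lexicon (hypothetical, for illustration only).
NEGATIVE = {"wary", "distrust", "afraid", "angry", "never"}
POSITIVE = {"trust", "reliable", "honest", "helpful"}

def dictionary_sentiment(text: str) -> str:
    """Classify an answer as 'negative' or 'neutral/positive' by counting
    lexicon hits, mirroring the binary sentiment coding in the studies."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    neg_hits = len(words & NEGATIVE)
    pos_hits = len(words & POSITIVE)
    return "negative" if neg_hits > pos_hits else "neutral/positive"

print(dictionary_sentiment("I tend to be very wary of them."))  # negative
```

Note the weakness this exposes: "I don't think I could trust anyone" would match the positive entry "trust" and be misclassified, since lexicon counting ignores negation and context. This is exactly the limitation that motivates the move from dictionaries to deep-learning models (pysentimiento, SpeechBrain).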

Study 3: Results

Figure 5: Emotions in Speech Data from SpeechBrain.
Note. n_neutral=408, n_anger=44, n_sadness=18, n_happiness=21. Reference category (right): neutral.

Summary and Conclusions

  • Web surveys allow researchers to collect narrative answers that provide valuable insights into survey responses
    • think aloud, associations, emotions, tonal cues, additional info, etc.
  • New technologies (e.g., speech recognition) allow innovative data collection
  • Analyzing natural language can inform various debates, e.g.:
    • Study 1: equivalence debate in trust research (cf. Bauer & Freitag 2018)
    • Study 2: oral response formats in web surveys (cf. Gavras et al. 2022)
    • Study 3: cognitive-versus-affective debate in political trust research (cf. Theiss-Morse and Barton 2017)
    • Study 1-3: item and data quality in general (e.g., associations, information content, sentiment, emotions)

Summary and Conclusions

Semi-automated methods for open survey answers

  • supervised machine learning requires sufficient and high-quality training data (i.e., labeled examples)
  • LLMs enable modeling with less training data by incorporating domain-specific knowledge (via fine-tuning and prompting techniques)

    • E.g., Study 1: BERT outperforms Random Forest with 1,500 labeled examples

Summary and Conclusions

  • But LLMs suffer from high complexity and limited transparency
    • start with simple methods and evaluate
      • Study 1: Random Forest → BERT
      • Study 3: dictionary approach → deep learning
    • trade-off between accuracy and explainability
  • Fully manual, semi-automated, or fully automated?
    • task difficulty, sample size, structure of the answers, state of previous research, accuracy and transparency, available resources

Thank you for your Attention!

References

Bauer, P. C., and M. Freitag. 2018. “Measuring Trust.” Pp. 1–27 in The Oxford Handbook of Social and Political Trust, edited by E. M. Uslaner. Oxford University Press.

Gavras, K. et al. 2022. “Innovating the collection of open-ended answers: The linguistic and content characteristics of written and oral answers to political attitude questions.” Journal of the Royal Statistical Society. Series A, 185(3):872-890.

Pérez, J. et al. 2023. “Pysentimiento: A Python Toolkit for Opinion Mining and Social NLP Tasks.” arXiv.

Ravanelli, M. et al. 2021. “SpeechBrain: A General-Purpose Speech Toolkit.” arXiv.

Theiss-Morse, E., and D. Barton. 2017. “Emotion, Cognition, and Political Trust.” Pp. 160–75 in Handbook on Political Trust. Edward Elgar Publishing.